Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second-order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.
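The idea of treating mutual information as a random quantity induced by a Dirichlet posterior over the chances can be illustrated with a small Monte Carlo sketch. This is not the paper's closed-form approximation; it simply samples joint chance matrices from the posterior `Dir(counts + alpha)` and computes the mutual information of each sample, yielding an empirical posterior mean and spread. The contingency table `counts` and the prior strength `alpha` are hypothetical choices for illustration.

```python
import numpy as np

def mutual_information(p):
    # Mutual information (in nats) of a joint probability matrix p.
    pi = p.sum(axis=1, keepdims=True)   # row marginals
    pj = p.sum(axis=0, keepdims=True)   # column marginals
    mask = p > 0                        # avoid log(0) terms
    return float(np.sum(p[mask] * np.log(p[mask] / (pi @ pj)[mask])))

def mi_posterior_samples(counts, alpha=1.0, n_samples=5000, seed=0):
    # Sample joint chance matrices from the Dirichlet posterior
    # Dir(counts + alpha) and return the MI of each sample.
    rng = np.random.default_rng(seed)
    shape = counts.shape
    samples = rng.dirichlet((counts + alpha).ravel(), size=n_samples)
    return np.array([mutual_information(s.reshape(shape)) for s in samples])

# Hypothetical 2x2 contingency table of observed counts.
counts = np.array([[30.0, 5.0], [4.0, 25.0]])
mis = mi_posterior_samples(counts)
print("posterior mean MI:", mis.mean(), "posterior std:", mis.std())
```

A filter of the kind evaluated in the paper would rank features by statistics of this posterior (e.g. a credible lower bound on MI) rather than by the single empirical MI value.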